1 Setup

Ensure that the following packages are installed: knitr.

2 Introduction to this Project

The purpose of this project is to gain experience in genome-wide analyses of differentially expressed genes using permutation distributions and the False Discovery Rate (FDR).

This project follows the first example provided in Chapter 16, “Two-Color Case Studies,” of the Limma User Guide: Linear Models for Microarray and RNA-Seq Data User’s Guide, specifically section 16.1: “Swirl Zebrafish: A Single-Group Experiment.” The example is augmented by generating some additional plots.

The Limma User Guide describes this data as follows:

Background. The experiment was carried out using zebrafish as a model organism to study the early development in vertebrates. Swirl is a point mutant in the BMP2 gene that affects the dorsal/ventral body axis. The main goal of the Swirl experiment is to identify genes with altered expression in the Swirl mutant compared to wild-type zebrafish.

The hybridizations. Two sets of dye-swap experiments were performed making a total of four replicate hybridizations. Each of the arrays compares RNA from swirl fish with RNA from normal (“wild type”) fish.”

3 Download, Read In, and Inspect Swirl Data

We consulted the Limma User Guide (pages 76 and 77) to identify the “targets” file containing metadata about the samples hybridized to the four arrays in this experiment. The name of the file was added to the following code chunk, in the call to readTargets. This parametrized the subsequent call to read.maimages, which used filenames for image data from each microarray, returning a Limma RGlist object.

setwd("~/Desktop/swirl")
library(limma)
targets <- readTargets("SwirlSample.txt")
RG <- read.maimages(targets, source="spot")
## Read swirl.1.spot 
## Read swirl.2.spot 
## Read swirl.3.spot 
## Read swirl.4.spot
RG
## An object of class "RGList"
## $R
##        swirl.1   swirl.2   swirl.3    swirl.4
## [1,] 19538.470 16138.720 2895.1600 14054.5400
## [2,] 23619.820 17247.670 2976.6230 20112.2600
## [3,] 21579.950 17317.150 2735.6190 12945.8500
## [4,]  8905.143  6794.381  318.9524   524.0476
## [5,]  8676.095  6043.542  780.6667   304.6190
## 8443 more rows ...
## 
## $G
##        swirl.1   swirl.2   swirl.3    swirl.4
## [1,] 22028.260 19278.770 2727.5600 19930.6500
## [2,] 25613.200 21438.960 2787.0330 25426.5800
## [3,] 22652.390 20386.470 2419.8810 16225.9500
## [4,]  8929.286  6677.619  383.2381   786.9048
## [5,]  8746.476  6576.292  901.0000   468.0476
## 8443 more rows ...
## 
## $Rb
##      swirl.1 swirl.2 swirl.3 swirl.4
## [1,]     174     136      82      48
## [2,]     174     133      82      48
## [3,]     174     133      76      48
## [4,]     163     105      61      48
## [5,]     140     105      61      49
## 8443 more rows ...
## 
## $Gb
##      swirl.1 swirl.2 swirl.3 swirl.4
## [1,]     182     175      86      97
## [2,]     171     183      86      85
## [3,]     153     183      86      85
## [4,]     153     142      71      87
## [5,]     153     142      71      87
## 8443 more rows ...
## 
## $targets
##         SlideNumber     FileName       Cy3       Cy5      Date
## swirl.1          81 swirl.1.spot     swirl wild type 2001/9/20
## swirl.2          82 swirl.2.spot wild type     swirl 2001/9/20
## swirl.3          93 swirl.3.spot     swirl wild type 2001/11/8
## swirl.4          94 swirl.4.spot wild type     swirl 2001/11/8
## 
## $source
## [1] "spot"

Similarly, on page 78, we identified the “GAL” file containing metadata about the genes spotted on the arrays. The file name was included in the following code chunk, in the call to readGAL.

setwd("~/Desktop/swirl")
RG$genes <- readGAL("fish.gal")
RG$printer <- getLayout(RG$genes)

4 Create Image Plots

First, we created image plots for the background intensities of the first array, swirl.1.

imageplot(log2(RG$Rb[,1]), RG$printer, low="white", high="red")

imageplot(log2(RG$Gb[,1]), RG$printer, low="white", high="green")

Next, we visualized how this background variation affects M-values computed for the first array. Note that no normalization method is applied when computing the M and A values here.

MA_raw <- normalizeWithinArrays(RG, method="none")
imageplot(MA_raw$M[,1], RG$printer, zlim=c(-3,3))

Below is an interpretation of the result, as described on page 80 of the Limma User Guide:

The imageplot() function lies the slide on its side, so the first print-tip group is bottom left in this plot. We can see a red streak across the middle two grids of the 3rd row caused by a scratch or dust on the array. Spots which are affected by this artifact will have suspect M-values. The streak also shows up as darker regions in the background plots. The red streak seen on the image plot can be seen as a line of spots in the upper right of this plot.

5 Visualize the Unnormalized Data Using an MA-plot and Boxplots of M-values by Print-Tip Group

We created an MA-plot for the first array and MA-plots grouped by print-tip for the unnormalized data:

plotMD(MA_raw, column = 1) ## column = 1 is the default
abline(0,0,col="blue")

plotPrintTipLoess(MA_raw, array = 1)  ## array = 1 is the default

We also generated boxplots for all arrays to visualize the raw M-values without normalization:

library(ggokabeito)
boxplot(MA_raw$M[,1]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.1")

boxplot(MA_raw$M[,2]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.2")

boxplot(MA_raw$M[,3]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.3")

boxplot(MA_raw$M[,4]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.4")

6 Apply Default “print-tip” Normalization and Visualize M-values

We normalized within arrays using the default “printtiploess” method and visualized the normalized M-values.

MA_print_tip <- normalizeWithinArrays(RG,method="printtiploess") 
# ^^^ method="printtiploess" is the default
plotMD(MA_print_tip)
abline(0,0,col="blue")

plotPrintTipLoess(MA_print_tip)

boxplot(MA_print_tip$M[,1]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.1")

boxplot(MA_print_tip$M[,2]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.2")

boxplot(MA_print_tip$M[,3]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.3")

boxplot(MA_print_tip$M[,4]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.4")

7 Apply Scale Normalization Across Arrays

Following the Limma User Guide’s recommendation, we applied scale normalization across arrays and visualized the results.

MA_print_tip_scale <- normalizeBetweenArrays(MA_print_tip,method="scale")
plotMD(MA_print_tip_scale)
abline(0,0,col="blue")

plotPrintTipLoess(MA_print_tip_scale)

boxplot(MA_print_tip_scale$M[,1]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess + scale) vs Print-Tip for swirl.1")

boxplot(MA_print_tip_scale$M[,2]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess + scale) vs Print-Tip for swirl.2")

boxplot(MA_print_tip_scale$M[,3]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normalized M (printtiploess + scale)vs Print-Tip for swirl.3")

boxplot(MA_print_tip_scale$M[,4]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normalized M (printtiploess + scale) vs Print-Tip for swirl.4")

8 Fit the Linear Model

design <- modelMatrix(targets, ref="wild type")
## Found unique target names:
##  swirl wild type
design
##         swirl
## swirl.1    -1
## swirl.2     1
## swirl.3    -1
## swirl.4     1
fit_print_tip_scale <- lmFit(MA_print_tip_scale,design)

9 Apply Empirical Bayes Moderation and Visualize Results

ebayes_print_tip_scale <- eBayes(fit_print_tip_scale) ## the current function is eBayes
qqt(ebayes_print_tip_scale$t,
    df=ebayes_print_tip_scale$df.prior+ebayes_print_tip_scale$df.residual,pch=16,cex=0.2)

plotMD(ebayes_print_tip_scale)
abline(0,0,col="blue")
top30 <- order(ebayes_print_tip_scale$lods,decreasing=TRUE)[1:30]
text(ebayes_print_tip_scale$Amean[top30],ebayes_print_tip_scale$coef[top30],labels=ebayes_print_tip_scale$genes[top30,"Name"],cex=0.8,col="blue")

volcanoplot(ebayes_print_tip_scale,ylab="Surprisal (-logP)")

ordinary.t <- fit_print_tip_scale$coef / fit_print_tip_scale$stdev.unscaled / fit_print_tip_scale$sigma
head(ordinary.t)
##        swirl
## 1 -2.4751770
## 2 -2.1079491
## 3 -1.8683988
## 4 -1.3537954
## 5 -2.3612873
## 6  0.8435477
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
tibble(moderated_t=unname(ebayes_print_tip_scale$t),ordinary_t=unname(ordinary.t)) |> mutate(shrink=(abs(moderated_t) < abs(ordinary_t))) |> ggplot(aes(x=ordinary_t,y=moderated_t,color=shrink)) + geom_point(alpha=0.5) + labs(title="Effect of Limma Moderation on Gene-Wise t") +  scale_color_okabe_ito()

10 Identify Interesting Genes

options(digits=3)
topTable(ebayes_print_tip_scale,number=30)
##      Block Row Column      ID   Name logFC AveExpr     t  P.Value adj.P.Val
## 3721     8   2      1 control   BMP2 -2.21    12.1 -21.1 1.03e-07  0.000357
## 1609     4   2      1 control   BMP2 -2.30    13.1 -20.3 1.34e-07  0.000357
## 3723     8   2      3 control   Dlx3 -2.18    13.3 -20.0 1.48e-07  0.000357
## 1611     4   2      3 control   Dlx3 -2.18    13.5 -19.6 1.69e-07  0.000357
## 8295    16  16     15 fb94h06 20-L12  1.27    12.0  14.1 1.74e-06  0.002067
## 7036    14   8      4 fb40h07  7-D14  1.35    13.8  13.5 2.29e-06  0.002067
## 515      1  22     11 fc22a09 27-E17  1.27    13.2  13.4 2.44e-06  0.002067
## 5075    10  14     11 fb85f09 18-G18  1.28    14.4  13.4 2.46e-06  0.002067
## 7307    14  19     11 fc10h09 24-H18  1.20    13.4  13.2 2.67e-06  0.002067
## 319      1  14      7 fb85a01  18-E1 -1.29    12.5 -13.1 2.91e-06  0.002067
## 2961     6  14      9 fb85d05 18-F10 -2.69    10.3 -13.0 3.04e-06  0.002067
## 4032     8  14     24 fb87d12 18-N24  1.27    14.2  12.8 3.28e-06  0.002067
## 6903    14   2     15 control    Vox -1.26    13.4 -12.8 3.35e-06  0.002067
## 4546     9  14     10 fb85e07 18-G13  1.23    14.2  12.8 3.42e-06  0.002067
## 683      2   7     11 fb37b09  6-E18  1.31    13.3  12.4 4.10e-06  0.002182
## 1697     4   5     17 fb26b10  3-I20  1.09    13.3  12.4 4.30e-06  0.002182
## 7491    15   5      3 fb24g06  3-D11  1.33    13.6  12.3 4.39e-06  0.002182
## 4188     8  21     12 fc18d12 26-F24 -1.25    12.1 -12.2 4.71e-06  0.002209
## 4380     9   7     12 fb37e11  6-G21  1.23    14.0  12.0 5.19e-06  0.002216
## 3726     8   2      6 control  fli-1 -1.32    10.3 -11.9 5.40e-06  0.002216
## 2679     6   2     15 control    Vox -1.25    13.4 -11.9 5.72e-06  0.002216
## 5931    12   6      3 fb32f06  5-C12 -1.10    13.0 -11.7 6.24e-06  0.002216
## 7602    15   9     18 fb50g12  9-L23  1.16    14.0  11.7 6.25e-06  0.002216
## 2151     5   2     15 control   vent -1.40    12.7 -11.7 6.30e-06  0.002216
## 3790     8   4     22 fb23d08  2-N16  1.16    12.5  11.6 6.57e-06  0.002221
## 7542    15   7      6 fb36g12  6-D23  1.12    13.5  11.0 9.23e-06  0.003000
## 4263     9   2     15 control   vent -1.41    12.7 -10.8 1.06e-05  0.003326
## 6375    13   2     15 control   vent -1.37    12.5 -10.5 1.33e-05  0.004026
## 1146     3   4     18 fb22a12  2-I23  1.05    13.7  10.2 1.57e-05  0.004242
## 157      1   7     13 fb38a01   6-I1 -1.82    10.8 -10.2 1.58e-05  0.004242
##         B
## 3721 7.96
## 1609 7.78
## 3723 7.71
## 1611 7.62
## 8295 5.78
## 7036 5.54
## 515  5.48
## 5075 5.48
## 7307 5.40
## 319  5.32
## 2961 5.29
## 4032 5.22
## 6903 5.20
## 4546 5.18
## 683  5.02
## 1697 4.97
## 7491 4.96
## 4188 4.89
## 4380 4.80
## 3726 4.76
## 2679 4.71
## 5931 4.63
## 7602 4.63
## 2151 4.62
## 3790 4.58
## 7542 4.27
## 4263 4.13
## 6375 3.91
## 1146 3.76
## 157  3.75
tt <- topTable(ebayes_print_tip_scale,n=Inf)
head(tt[tt$adj.P.Val <= 0.1,])
##      Block Row Column      ID   Name logFC AveExpr     t  P.Value adj.P.Val
## 3721     8   2      1 control   BMP2 -2.21    12.1 -21.1 1.03e-07  0.000357
## 1609     4   2      1 control   BMP2 -2.30    13.1 -20.3 1.34e-07  0.000357
## 3723     8   2      3 control   Dlx3 -2.18    13.3 -20.0 1.48e-07  0.000357
## 1611     4   2      3 control   Dlx3 -2.18    13.5 -19.6 1.69e-07  0.000357
## 8295    16  16     15 fb94h06 20-L12  1.27    12.0  14.1 1.74e-06  0.002067
## 7036    14   8      4 fb40h07  7-D14  1.35    13.8  13.5 2.29e-06  0.002067
##         B
## 3721 7.96
## 1609 7.78
## 3723 7.71
## 1611 7.62
## 8295 5.78
## 7036 5.54
dim(tt[tt$adj.P.Val <= 0.1,])
## [1] 277  11
dt <- decideTests(ebayes_print_tip_scale,p.value = 0.1)
summary(dt)
##        swirl
## Down     156
## NotSig  8171
## Up       121
plotMD(ebayes_print_tip_scale,status=dt)

11 Augment Results with FDR Estimation

We compared the number of differentially expressed genes (DEGs) identified by Limma to those obtained using FDR estimation methods, including pi0 estimation.

library(FDRestimation)
hist(ebayes_print_tip_scale$p.value)

ebayes.fit.fdr <- p.fdr(pvalues=ebayes_print_tip_scale$p.value,threshold = 0.1)
plot(ebayes.fit.fdr, main="Benjamini-Hochberg FDRs for Swirl")

plot(ebayes.fit.fdr,xlim = c(0,1500),ylim=c(0,0.2),main="Blowup for Benjamini-Hochberg FDRs for Swirl")

sum(ebayes.fit.fdr$fdrs <= 0.1)
## [1] 277
sum(ebayes.fit.fdr$`Results Matrix`$`BH FDRs`<= 0.1)
## [1] 277
sum(ebayes.fit.fdr$`Results Matrix`$`Adjusted p-values`<= 0.1)
## [1] 277
sum(ebayes.fit.fdr$`Reject Vector` == "Reject.H0")
## [1] 277
ebayes.fit.fdr.set.pi0 <- p.fdr(pvalues=ebayes_print_tip_scale$p.value,
                                set.pi0 = get.pi0(ebayes_print_tip_scale$p.value),threshold = 0.1)
plot(ebayes.fit.fdr.set.pi0,main="Benjamini-Hochberg FDRs for Swirl with pi0 estimation")

plot(ebayes.fit.fdr.set.pi0,xlim = c(0,1500),ylim=c(0,0.2),main="Blowup for Benjamini-Hochberg FDRs for Swirl with pi0 estimation")

sum(ebayes.fit.fdr.set.pi0$fdrs <= 0.1)
## [1] 516
sum(ebayes.fit.fdr.set.pi0$`Results Matrix`$`BH FDRs`<= 0.1)
## [1] 516
sum(ebayes.fit.fdr.set.pi0$`Results Matrix`$`Adjusted p-values`<= 0.1)
## [1] 277
sum(ebayes.fit.fdr.set.pi0$`Reject Vector` == "Reject.H0")
## [1] 277